A Scenario-Generic Neural Machine Translation Data Augmentation Method
Abstract
Amid the rapid advancement of neural machine translation, the challenge of data sparsity has been a major obstacle. To address this issue, this study proposes a general data augmentation technique for various scenarios. It examines the predicament of parallel corpora diversity and high quality in both rich- and low-resource settings, and integrates the low-frequency word substitution method and the reverse translation approach for complementary benefits. Additionally, it improves the pseudo-parallel corpus generated by reverse translation by substituting low-frequency words, and includes a grammar error correction module to reduce the grammatical errors introduced by substitution. The experimental data are partitioned into rich- and low-resource scenarios at a 10:1 ratio. The study verifies the necessity of grammatical error correction for the pseudo-corpus. Models and methods are chosen from the backbone network and related literature for comparative experiments. The findings demonstrate that the proposed method is suitable for both scenarios and effective in enhancing the training corpus to improve performance on translation tasks.
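The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the order of the steps, the whitespace tokenization, the `max_count` threshold, and the names `low_frequency_words`, `augment`, `back_translate`, `substitute`, and `correct_grammar` are all assumptions made for the example. The three callables stand in for a reverse translation model, a substitution model, and a grammar error correction module that the caller would have to supply.

```python
from collections import Counter
from typing import Callable, List, Set, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def low_frequency_words(sentences: List[str], max_count: int = 1) -> Set[str]:
    """Return whitespace tokens that occur at most `max_count` times.

    The threshold is a hypothetical choice, not a value from the paper.
    """
    counts = Counter(tok for s in sentences for tok in s.split())
    return {tok for tok, c in counts.items() if c <= max_count}


def augment(
    parallel: List[Pair],
    monolingual_target: List[str],
    back_translate: Callable[[str], str],
    substitute: Callable[[Pair, Set[str]], Pair],
    correct_grammar: Callable[[Pair], Pair],
) -> List[Pair]:
    """Combine the three steps named in the abstract (assumed ordering)."""
    # Step 1: reverse translation turns target-side monolingual text
    # into pseudo-parallel pairs (synthetic source, real target).
    pseudo = [(back_translate(t), t) for t in monolingual_target]

    # Step 2: substitute low-frequency words into the pseudo pairs.
    # Here "rare" is computed on the source side of the real corpus.
    rare = low_frequency_words([src for src, _ in parallel])
    substituted = [substitute(pair, rare) for pair in pseudo]

    # Step 3: grammar error correction on the substituted pairs.
    corrected = [correct_grammar(pair) for pair in substituted]

    # The augmented training set is the real corpus plus the cleaned pairs.
    return parallel + corrected
```

Passing the models in as callables keeps the sketch independent of any particular translation or correction toolkit; identity functions are enough to check the data flow.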
Similar resources
Data Augmentation for Low-Resource Neural Machine Translation
The quality of a Neural Machine Translation system depends substantially on the availability of sizable parallel corpora. For low-resource language pairs this is not the case, resulting in poor translation quality. Inspired by work in computer vision, we propose a novel data augmentation approach that targets low-frequency words by generating new sentence pairs containing rare words in new, syn...
Neural Machine Translation Training in a Multi-Domain Scenario
In this paper, we explore alternative ways to train a neural machine translation system in a multi-domain scenario. We investigate data concatenation (with fine tuning), model stacking (multi-level fine tuning), data selection and weighted ensemble. We evaluate these methods based on three criteria: i) translation quality, ii) training time, and iii) robustness towards out-of-domain tests. Our ...
Evaluating Machine Translation in a Usage Scenario
In this document we report on a user-scenario-based evaluation aiming at assessing the performance of machine translation (MT) systems in a real context of use. We describe a sequel of experiments that has been performed to estimate the usefulness of MT and to test if improvements of MT technology lead to better performance in the usage scenario. One goal is to find the best methodology for eva...
Dynamic Data Selection for Neural Machine Translation
Intelligent selection of training data has proven a successful technique to simultaneously increase training efficiency and translation performance for phrase-based machine translation (PBMT). With the recent increase in popularity of neural machine translation (NMT), we explore in this paper to what extent and how NMT can also benefit from data selection. While state-of-the-art data selection ...
Neural Name Translation Improves Neural Machine Translation
In order to control computational complexity, neural machine translation (NMT) systems convert all rare words outside the vocabulary into a single unk symbol. Previous solution (Luong et al., 2015) resorts to use multiple numbered unks to learn the correspondence between source and target rare words. However, testing words unseen in the training corpus cannot be handled by this method. And it a...
Journal
Journal title: Electronics
Year: 2023
ISSN: 2079-9292
DOI: https://doi.org/10.3390/electronics12102320